(57) Abstract: The invention refers to a method of automatic recognition of
musical compositions and sound signals, which is used for the identification of
musical compositions and sound signals played by radio or TV, or performed in
public places. According to this invention, a desirably large number of musical
compositions and sound signals that are to be identified is selected. To every
one of these signals an original procedure is applied, leading to the extraction
of a set of characteristics which will finally represent a model signal.
Subsequently, for the implementation of the recognition, the unknown musical
composition or sound signal is received and digitised. To its digitised version
the same procedure of extracting a set of characteristics is applied. These are
compared with the corresponding sets of the model signals, and by means of
original criteria it is decided whether there is a model signal that corresponds
to the unknown signal under consideration. Moreover, it is decided exactly which
model signal corresponds to the unknown one.

Method of Automatic Recognition of Musical Compositions and Sound Signals

This invention refers to a method of automatic recognition of musical
compositions and sound signals; it is used to identify musical compositions and
sound signals transmitted by radio or TV and/or performed in public places.

In the past, efforts to develop methods for the automatic recognition of musical
compositions and sound signals have led to the creation of systems performing
this task. However, these methods and the related systems exhibit a low
percentage of successful recognition, both for the musical compositions and for
the sound signals of interest. The introduced method offers a much better
percentage of fully automatic recognition, greater than or equal to ninety-eight
percent (98%).

According to this invention, a desirably high number of musical compositions and
sound signals that are to be identified is selected. For easy reference, these
compositions and signals are referred to as model signals. To every one of these
signals an original procedure is applied, leading to the extraction of a set of
characteristics which will finally represent each model signal. Subsequently,
for the implementation of the recognition, the unknown musical composition or
sound signal is received, and the same procedure of extracting a corresponding
set of characteristics is applied to it. These characteristics are compared with
the corresponding sets of characteristics of the model signals and, by means of
a number of original criteria, it is decided whether one of the model signals
(and exactly which one) corresponds to the unknown signal under consideration.
This procedure is described in figure 1.

It is stressed that, officially, there is no reference in the international
bibliography to a similar method or a related system. In the world market there
are very few similar systems, and they offer a percentage of successful
recognition of less than sixty percent (60%).

The invention is described more thoroughly below: 

First, the whole frequency band from 0 to 11025 Hz is divided into sub-bands
that are almost exponentially distributed. An implementation of such a division
is presented in Table 1.

According to this implementation, the whole frequency band from 0 to 11025 Hz is
divided into 60 sub-bands.
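
To give a feeling for what an "almost exponentially distributed" division looks
like, the following sketch builds 60 sub-bands whose edges form a geometric
progression. It does not reproduce Table 1: the function name, the exactly
geometric spacing and the lower edge of 20 Hz (a geometric grid cannot start at
0 Hz) are all assumptions made only for illustration.

```python
def exponential_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 1 edges (Hz) forming a geometric progression.

    Hypothetical helper: the real division is the one given in Table 1.
    """
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    return [f_min * ratio ** k for k in range(n_bands + 1)]


# 60 sub-bands up to 11025 Hz; the 20 Hz lower edge is an assumed value.
edges = exponential_band_edges(20.0, 11025.0, 60)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print(len(edges) - 1, "bands; first width %.2f Hz, last width %.2f Hz"
      % (widths[0], widths[-1]))
```

With such a grid the low-frequency sections are narrow and the high-frequency
sections are wide, which is the qualitative behaviour the description asks for.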

Subsequently, each model signal is digitised with an arbitrary sampling
frequency Fs, preferably greater than or equal to 11025 Hz, and a window of
8192, 16384 or 32768 samples in length slides over the obtained digitised
signal. In every such window an adaptive Fast Fourier Transform is applied and
the absolute value of the Discrete Fourier Transform is obtained. Next, the
frequency-domain window is divided into sections according to the aforementioned
choice of frequency sub-bands (see Table 1) and then, in every such section, all
the peaks of the absolute value of the Fourier transform are spotted and the
greatest one is obtained. The value of this peak is called the "section
representative". Then the L "representatives" with the greatest values are
spotted, where the value of L may vary from 13 to 30, the most frequently used
value being L = 20. The indices of the sections corresponding to these
representatives, sorted in increasing order, form a vector, which constitutes
the "representative vector" of the window. The above procedure is repeated while
the window slides over the whole digitised model signal, thus creating all the
representative vectors for the specific model signal. Notice that, while the
window slides over the model signal, the generated representative vectors often
remain unchanged in successive windows, successive in the sense that they start
at positions differing by one sample from each other. For this reason, to every
representative vector we assign a number indicating the number of successive
windows in which the specific vector remained unchanged. This number is called
the "number of repetitions" of the representative vector. The set of the
generated representative vectors of each model signal is called "the model
signal set of representatives". The aforementioned procedure is described in
figure 2.
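
The per-window procedure can be sketched as follows. The sketch starts from an
already computed magnitude spectrum (so no FFT library is needed) and assumes
that the sub-bands are given as a list of bin boundaries; the function names,
the toy spectrum and the toy boundaries are hypothetical, and the repetition
count is taken here to be the length of each run of identical vectors.

```python
def find_peaks(mag):
    """Indices k at which mag[k] is a strict local maximum."""
    return [k for k in range(1, len(mag) - 1)
            if mag[k - 1] < mag[k] > mag[k + 1]]


def representative_vector(mag, band_edges, L):
    """Section indices (1-based, increasing) of the L greatest representatives.

    Section s covers the bins band_edges[s-1] <= k < band_edges[s].
    """
    best = {}  # section index -> greatest peak value found in that section
    for k in find_peaks(mag):
        for s in range(1, len(band_edges)):
            if band_edges[s - 1] <= k < band_edges[s]:
                best[s] = max(best.get(s, 0.0), mag[k])
                break
    top = sorted(best, key=lambda s: best[s], reverse=True)[:L]
    return sorted(top)


def with_repetitions(vectors):
    """Collapse runs of identical successive vectors into [vector, count]."""
    runs = []
    for v in vectors:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


# Toy spectrum with four sections; keep the L = 3 strongest representatives.
mag = [0, 5, 0, 3, 0, 0, 9, 1, 2, 1, 0, 7, 0]
edges = [0, 4, 8, 10, 13]
print(representative_vector(mag, edges, 3))
print(with_repetitions([[1, 2], [1, 2], [3], [1, 2]]))
```

Storing each vector once with its repetition count keeps the model set compact
even though the window advances by a single sample.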

For the identification of the unknown sound signal, which from now on will be
called the "unknown signal", the following procedure is used:

A part of the unknown signal, of length varying from eight (8) to sixteen (16)
seconds, is received, digitised and stored, at least temporarily. At the
beginning of that part of the unknown signal a window of length W_g = 8192,
W_g = 16384 or W_g = 32768 samples is obtained; notice that in any case this
window has the same length as the sliding window used for the model signals. In
this window a Fast Fourier Transform is applied and its absolute value is
obtained. Afterwards, all the peaks of the absolute value of the Fourier
transform are spotted and S copies of these peaks are created. For the creation
of every copy of the peaks, the positions of the peaks are multiplied by a
different coefficient f_i, i = 0, 1, ..., S, which is called the "window shift
coefficient". Thus, S+1 different groups of peaks are created. For every one of
these groups the following procedure is carried out: the section to which each
peak corresponds, according to the aforementioned division into frequency
sub-bands, is spotted (see Table 1). For every section to which at least one
peak corresponds, the greatest peak is kept. The value of this peak is called
the "representative of the section of the unknown signal corresponding to the
shift coefficient f_i".

Next, the L representatives with the greatest values are spotted, where the
value of L is the same as the one used for the model signals. The indices of the
sections corresponding to these representatives, sorted in increasing order,
form a vector, which constitutes the "first representative vector of the unknown
signal corresponding to the shift coefficient f_i".
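
The effect of a shift coefficient on the representative vector can be sketched
as below. The sketch assumes that peaks are given as (frequency in Hz, value)
pairs and that the sub-bands are given as a sorted list of edges in Hz; scaling
frequencies is equivalent to scaling bin positions. The function name and the
toy data are hypothetical.

```python
import bisect


def shifted_representative_vector(peaks, f, band_edges_hz, L):
    """Representative vector of one window for one shift coefficient f."""
    best = {}  # section index -> greatest shifted peak value in that section
    for freq, value in peaks:
        shifted = freq * f
        # 1-based section s such that edges[s-1] <= shifted < edges[s]
        s = bisect.bisect_right(band_edges_hz, shifted)
        if 1 <= s < len(band_edges_hz):
            best[s] = max(best.get(s, 0.0), value)
    top = sorted(best, key=lambda s: best[s], reverse=True)[:L]
    return sorted(top)


edges_hz = [0, 100, 200, 400, 800]
peaks = [(99.5, 4.0), (150.0, 6.0), (390.0, 2.0), (600.0, 5.0)]
print(shifted_representative_vector(peaks, 1.00, edges_hz, 3))
print(shifted_representative_vector(peaks, 1.01, edges_hz, 3))
```

In this toy case a shift of one percent moves the 99.5 Hz peak across a section
boundary, so the two calls produce different vectors; this is why one group of
vectors is kept for every shift coefficient.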

Afterwards, the window slides by δ_1 samples, where the value of δ_1 may vary
from 0.55*W_g to 1.9*W_g samples, the most frequently used value being
δ_1 = 1.4*W_g. For the new window position and for every shift coefficient f_i,
i = 0, 1, ..., S, (S+1) vectors are computed in the way described above; each
such vector is called the "second representative vector of the unknown signal
corresponding to the shift coefficient f_i". The above procedure is repeated for
M-2 more windows, where each window starts at a sample at a distance of δ_i
samples from the start of the previous one, i = 2, 3, ..., M-1, and where the
value of M may vary between 7 and 13 windows, the most usual value being M = 9.
In this way S+1 groups of M representative vectors are obtained; each such group
is called the "group of unknown signal representative vectors corresponding to
the shift coefficient f_i".
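
As a plausibility check of the window layout, the sketch below computes the
start samples of M = 9 windows of 8192 samples. It assumes that each step is
expressed as a multiple of the window length (1.4 here), that steps are rounded
to whole samples, and that the sampling frequency is 11025 Hz; the function and
variable names are hypothetical.

```python
def window_starts(window_len, step_factors):
    """Start samples of successive windows.

    Step i is step_factors[i] * window_len, rounded to a whole sample
    (the rounding is an assumption of this sketch).
    """
    starts = [0]
    for factor in step_factors:
        starts.append(starts[-1] + round(factor * window_len))
    return starts


W = 8192            # window length in samples
M = 9               # number of windows
FS = 11025          # assumed sampling frequency in Hz
starts = window_starts(W, [1.4] * (M - 1))
span_seconds = (starts[-1] + W) / FS
print(starts)
print("signal needed: %.2f s" % span_seconds)
```

Under these assumptions the M windows cover roughly nine seconds, which lies
inside the eight to sixteen seconds of unknown signal that are stored.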

It must be stressed that, for a specific application, the distances δ_i between
the starts of successive windows, i = 1, 2, ..., M-1, are not necessarily equal,
but must be kept fixed throughout the whole procedure. The exact number (S+1) of
the shift coefficients f_i varies from 1 to 15, while their values are given by
the formula:

$$
f_i =
\begin{cases}
1 + \dfrac{i+1}{2}\,\mathrm{STEP}, & \text{if } i \text{ is odd} \\[6pt]
1 - \dfrac{i}{2}\,\mathrm{STEP}, & \text{if } i \text{ is even}
\end{cases}
$$

where STEP is a parameter expressing the shift step, which usually belongs to
the interval [0.005, 0.01], the most frequently used value being 0.0075. The
identification procedure described so far is depicted in figure 3.
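
The shift coefficients can be enumerated with a few lines of code. The sketch
assumes that the formula reads f_i = 1 + ((i+1)/2)*STEP for odd i and
f_i = 1 - (i/2)*STEP for even i; this reading, like the function name, is an
assumption.

```python
def shift_coefficients(S, step=0.0075):
    """Return [f_0, ..., f_S]; odd i shift upwards, even i shift downwards."""
    coeffs = []
    for i in range(S + 1):
        if i % 2 == 1:
            coeffs.append(1 + ((i + 1) // 2) * step)
        else:
            coeffs.append(1 - (i // 2) * step)
    return coeffs


# S = 4 gives five coefficients alternating around 1.
print(shift_coefficients(4))
```

Under this reading f_0 = 1 leaves the peaks where they are, and the remaining
coefficients alternate above and below 1 in multiples of STEP, so small speed or
pitch deviations in either direction are tried first.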

For the recognition of the unknown signal, each group of unknown signal
representatives is compared with elements of the set of representatives of each
model signal separately. To fix ideas, each of the S+1 groups of M unknown
signal representatives is compared with groups of M model signal representatives
by means of the method consisting of the following steps:

E1) If the first representative vector of one group of the unknown signal is
called V_1 and the first representative vector of the model signal is called
U_1, then initially the number of the common elements between these two vectors
is calculated. For example, if L = 20 and

V_1 = [60 55 52 49 47 43 39 34 33 30 29 22 20 17 14 11 9 5 2 1]

U_1 = [60 58 55 49 47 41 39 37 33 30 28 25 20 17 14 11 9 6 4 2]

then the number of the common elements is thirteen (13).
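
The count in this example can be verified directly. The sketch below treats each
vector as a set of section indices; the function name is hypothetical.

```python
def common_elements(v, u):
    """Number of section indices that appear in both vectors."""
    return len(set(v) & set(u))


V1 = [60, 55, 52, 49, 47, 43, 39, 34, 33, 30, 29, 22, 20, 17, 14, 11, 9, 5, 2, 1]
U1 = [60, 58, 55, 49, 47, 41, 39, 37, 33, 30, 28, 25, 20, 17, 14, 11, 9, 6, 4, 2]
L = 20

n = common_elements(V1, U1)
print(n)                 # 13
print(n >= 0.51 * L)     # True: 13 is at least 0.51 * 20 = 10.2
```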

Subsequently, it is checked whether the number of the common elements between
the vectors V_1 and U_1 is greater than or equal to the number 0.51*L, which is
called the "requisite similarity threshold". If it is indeed greater than or
equal to 0.51*L, we proceed to step E2 below. If it is smaller than 0.51*L, then
we consider that the set of the tests performed so far did not result in a
successful recognition, so, after taking U_1 to be the next representative
vector of the model signal, we start the comparison procedure again, beginning
from the comparison of the vector V_1 with the new U_1.

E2) If the second representative vector of the unknown signal, corresponding to
the same shift coefficient f_i as V_1, is called V_2 and the representative
vector of the model signal corresponding to the sample δ_1 * f_i (δ_1 being the
distance in samples between the first two windows and f_i the shift coefficient)
is called U_2, then we calculate the number of the common elements between these
two vectors.

Afterwards, we check whether the number of the common elements between the
vectors V_2 and U_2 is greater than or equal to the "requisite similarity
threshold". If it is greater or equal, we proceed to step E3 below. If it is
smaller, then we consider that the set of tests performed so far did not result
in a successful recognition, so, after taking U_1 to be the next representative
vector of the model signal, the comparison procedure starts again, beginning
from the comparison of the vector V_1 with the new U_1.



...

E(M-1)) If the (M-1)-th representative vector of the unknown signal,
corresponding to the same shift coefficient f_i as V_1, is called V_(M-1) and
the representative vector of the model signal corresponding to the sample

$$\sum_{j=1}^{M-2} (\delta_j \cdot f_i)$$

is called U_(M-1), then we calculate the number of the common elements between
these two vectors.

Next, we check whether the number of the common elements between the vectors
V_(M-1) and U_(M-1) is greater than or equal to the "requisite similarity
threshold".

If it is greater or equal, we proceed to step EM below. If it is smaller, then
we consider that the set of tests performed so far did not result in a
successful recognition, so, after taking U_1 to be the next representative
vector of the model signal, the comparison procedure starts again, beginning
from the comparison of the vector V_1 with the new U_1.

EM) If the M-th representative vector of the unknown signal, corresponding to
the same shift coefficient f_i as V_1, is called V_M and the representative
vector of the model signal corresponding to the sample

$$\sum_{j=1}^{M-1} (\delta_j \cdot f_i)$$

is called U_M, then we calculate the number of the common elements between these
two vectors V_M and U_M and we check whether it is greater than or equal to the
"requisite similarity threshold". If it is greater or equal, we proceed to step
E(M+1) below. If it is smaller, then we consider that the set of tests performed
so far did not result in a successful recognition, so, after taking U_1 to be
the next representative vector of the model signal, the checking procedure
starts again, beginning from the comparison of the vector V_1 with the new U_1.

E(M+1)) First we check how many of the pairs (V_1,U_1), (V_2,U_2), ...,
(V_M,U_M) have, according to the previous comparisons, a number of common
elements in the interval [0.51*L, 0.71*L]. If the number of these pairs is
greater than 0.34*M, then we consider that the set of tests performed so far did
not result in a successful recognition, so, after taking U_1 to be the next
representative vector of the model signal, the comparison procedure starts
again, beginning from the comparison of the vector V_1 with the new U_1. If the
number of these pairs is smaller than or equal to 0.34*M, then the following
check is carried out:

For the pairs of vectors (V_1,U_1), (V_2,U_2), ..., (V_M,U_M), which have
already been compared, we calculate the mean value of the number of the common
elements. If this mean value is greater than or equal to 0.71*L, then we
consider that the comparison between the group of the M representatives of the
model signal, corresponding to the shift coefficient f_i that we checked, and
the group of the representatives of the unknown signal is successful. If the
mean value is smaller than 0.71*L, then we consider that the set of tests
performed so far did not result in a successful recognition, so, after taking
U_1 to be the next representative vector of the model signal, the comparison
procedure starts again, beginning from the comparison of the vector V_1 with the
new U_1.
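
Steps E1 to E(M+1) reduce to three tests on the M common-element counts, which
can be sketched as below. The function name is hypothetical, and treating the
interval as closed at both ends is an assumption. In the procedure itself the
per-pair threshold test is applied as soon as each pair is compared, so a failed
alignment is abandoned early; the sketch simply evaluates all counts at once.

```python
def first_criterion(counts, L):
    """Decide whether M pairs (V_1,U_1) ... (V_M,U_M) pass the first criterion.

    counts -- number of common elements of each pair, in order.
    """
    M = len(counts)
    low, high = 0.51 * L, 0.71 * L
    # Steps E1..EM: every pair must reach the requisite similarity threshold.
    if any(c < low for c in counts):
        return False
    # Step E(M+1), first check: too many pairs are only moderately similar.
    moderate = sum(1 for c in counts if low <= c <= high)
    if moderate > 0.34 * M:
        return False
    # Step E(M+1), second check: mean number of common elements.
    return sum(counts) / M >= high


L = 20  # thresholds become 10.2 and 14.2; at most 3 moderate pairs for M = 9
print(first_criterion([16, 17, 15, 18, 13, 16, 17, 15, 19], L))
print(first_criterion([16, 17, 12, 18, 13, 14, 11, 15, 19], L))
```

The first call passes all three tests; the second fails because four of the nine
pairs fall inside the moderate interval.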

If all possible vectors of the model signal are unsuccessfully compared with one
group of representatives of the unknown signal corresponding to the specific
shift coefficient f_i, then we repeat the comparison procedure, using the group
of representatives of the unknown signal corresponding to the next shift
coefficient f_(i+1). If the comparison of a specific set of model vectors with
all (S+1) groups of representatives of the unknown signal is unsuccessful, then
we proceed to the comparison of the unknown signal with another set of model
vectors.

If the result of the above comparison is successful for a group of the unknown
signal corresponding to a specific shift coefficient, let's say f_i, we proceed
to the application of the irrevocable comparison criterion, which will be
described below.

As already mentioned, the successful application of the first aforementioned
criterion results in the determination of a group of M representatives of the
model signal U_1, U_2, ..., U_M which "fit" the group of the representatives of
the unknown signal V_1, V_2, ..., V_M corresponding to the specific shift
coefficient f_i. Since the positions of these vectors in their corresponding
signals are now known, it is possible to carry out a sequence of comparisons
between vectors of the unknown signal, corresponding to the specific shift
coefficient f_i, and the vectors of the model signal formed at the specific
positions where the first criterion was satisfied.

In this way, in the digitised unknown signal of duration from eight (8) to
sixteen (16) seconds, a window of length W_g is obtained, beginning at the
starting point of the unknown signal. In this window a Fast Fourier Transform is
applied again and its absolute value is obtained. Subsequently, the peaks of the
Fourier transform are spotted and their positions are multiplied by the shift
coefficient f_i which has previously been verified to satisfy the first
criterion. Then, in each section, the peaks are sorted according to their value.
In each section to which at least one peak has been ascribed, the greatest peak
is kept to form the "representative vector of the unknown signal". Next, the L
representatives with the greatest values are spotted, where the value of L is
the same as the one used in the first criterion. The indices of the sections
corresponding to these representatives, sorted in increasing order, form a
vector that constitutes the "first irrevocable representative vector of the
unknown signal".

WO 01/04870    PCT/GR00/00024

Then, the window slides by k_1 samples, where the value of k_1 is equal to

$$k_1 = \frac{\delta_1 \cdot (M-1)}{D-1}$$

where δ_1 is the distance in samples between the first two windows used in the
first criterion and D varies between 30 and 50. For the new window position a
new vector is calculated, in the same way as described before, called the
"second irrevocable representative vector of the unknown signal". The above
procedure is repeated for D-2 more windows, each one starting at a distance of
k_i samples from the start of the previous window, where k_i = [...],
i = 2, 3, ..., D-1.

In this way, finally, a group consisting of D representative-vectors is created. 
We will refer to this group with the name "irrevocable group of representatives of 
the unknown signal". 

In order to reach the final decision on whether the unknown signal corresponds
to the model signal in hand, the irrevocable group of representatives of the
unknown signal is compared with elements of the set of the representatives of
the model signal, by means of a method similar to the first criterion,
consisting of the steps briefly described below:

T1) If the first irrevocable representative vector of the unknown signal is
called V*_1 and U*_1 is the representative vector of the model signal
corresponding to the position, let's say h_1, where the first criterion has been
satisfied, then initially we calculate the number of the common elements between
these two vectors.

T2) If the second irrevocable representative vector of the unknown signal is
called V*_2, then this vector is compared with the vector U*_2, which is the
representative vector of the model signal corresponding to the position
h_1 + k_1 * f_i, where f_i is the shift coefficient that has been determined by
the first criterion.



T(D-1)) If the (D-1)-th irrevocable representative vector of the unknown signal
is called V*_(D-1) and the representative vector of the model signal
corresponding to the sample

$$h_1 + \sum_{j=1}^{D-2} (k_j \cdot f_i)$$

is called U*_(D-1), then we calculate the number of the common elements between
these two vectors.

Finally, having calculated the number of the common elements between these D
pairs of vectors, in order to decide on the identification, we check whether the
two conditions stated below are satisfied:

[Condition 1] At least 0.825*D of the pairs of vectors have a number of common
elements greater than 0.71*L.

[Condition 2] The total number of the common elements of the vectors, namely the
sum of the common elements of the pairs (V*_1,U*_1), (V*_2,U*_2), ...,
(V*_(D-1),U*_(D-1)), is greater than 0.6875*D*L.

If these two conditions are satisfied, then we have successfully recognised that
the specific musical composition corresponds to the model signal in hand.
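
The two conditions of the irrevocable criterion can be sketched as a single
test on the list of common-element counts. The function name is hypothetical,
and D is taken here to be the number of counts supplied.

```python
def irrevocable_criterion(counts, L):
    """Apply Condition 1 and Condition 2 to the irrevocable pairs.

    counts -- number of common elements of each irrevocable pair.
    """
    D = len(counts)
    strong = sum(1 for c in counts if c > 0.71 * L)
    condition_1 = strong >= 0.825 * D
    condition_2 = sum(counts) > 0.6875 * D * L
    return condition_1 and condition_2


# D = 40, L = 20: Condition 1 needs 33 strong pairs, Condition 2 a total > 550.
print(irrevocable_criterion([15] * 34 + [10] * 6, 20))   # both conditions hold
print(irrevocable_criterion([15] * 32 + [10] * 8, 20))   # Condition 1 fails
print(irrevocable_criterion([15] * 34 + [2] * 6, 20))    # Condition 2 fails
```

Condition 1 guards against a match carried by a few very good windows, while
Condition 2 guards against many windows that barely exceed the threshold.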

The whole procedure of the identification is described in Figures 3, 4 and 5.
The method for the automatic recognition of musical compositions and sound
signals, which is used for the identification of musical compositions and sound
signals played by radio or TV or performed in public places, is based on the
existence of a procedure which is applied to the model signals and results in
the extraction of a set of characteristics which will finally represent each
model signal. Besides, it is based on a similar procedure, which is applied to
the unknown musical composition or sound signal for the extraction of similar
characteristics and, finally, it is based on a comparison procedure performed
between the representative sets of characteristics of the model and the unknown
signal. This method is characterised by the model sets of characteristics
corresponding to a division of the frequency domain into bands. It is also
characterised by two original criteria for the decision of the identification,
according to which two musical compositions or sound signals are identified only
when:

a) A group of M representative vectors of the model signal U_1, U_2, ..., U_M,
where two successive vectors are calculated at samples having distance δ_i,
i = 1, 2, ..., M-1, "match" a group of representatives of the unknown signal
V_1, V_2, ..., V_M, which corresponds to a specific shift coefficient f_i.
Notice that the values of δ_i, i = 1, 2, ..., M-1, are not necessarily equal
but, in any case, are kept fixed throughout the application. The matching
between U_1, U_2, ..., U_M and V_1, V_2, ..., V_M is realised by means of the
following criterion:

All comparisons between the vectors of the pairs (V_1,U_1), (V_2,U_2), ...,
(V_M,U_M) are made and the number of pairs with a number of common elements in
the interval [0.51*L, 0.71*L] is computed. If it is greater than 0.34*M, then we
consider that the set of comparisons performed so far did not result in a
successful recognition. If this number is smaller than or equal to 0.34*M, then
it is checked whether the mean value of the number of common elements of the
vectors of the above pairs (V_1,U_1), (V_2,U_2), ..., (V_M,U_M) is greater than
or equal to 0.71*L. If it is, then we consider that the comparison between the
group of the M representatives, corresponding to the shift coefficient f_i, of
the model signal in hand and the group of representatives of the unknown signal
is successful.

b) A second group of D irrevocable representative vectors of the model signal
U*_1, U*_2, ..., U*_D, each calculated at a distance k_i from the previous one,
where

k_i = [...] / (D-1), i = 1, 2, ..., D-1,

which are not necessarily equal but are, in any case, kept fixed throughout the
application, "match" a group of representatives of the unknown signal V*_1,
V*_2, ..., V*_D which corresponds to a specific shift coefficient f_i, according
to the following criterion:
• At least 0.825*D of the pairs of vectors (V*_1,U*_1), (V*_2,U*_2), ...,
(V*_(D-1),U*_(D-1)) have a number of common elements greater than 0.71*L.

• The total number of the common elements of the vectors (namely the one that
results from the summation of the common elements of the pairs (V*_1,U*_1),
(V*_2,U*_2), ..., (V*_(D-1),U*_(D-1))) is greater than 0.6875*D*L.

If both these conditions (a) and (b) are satisfied, then we have successfully
recognised the specific musical composition.